Independent Automatic Segmentation of Speech by Pronunciation Modeling
Authors
Abstract
In this paper we present an iterative automatic segmentation system that does not require any domain-dependent training data. The input to the system is the canonical pronunciation and the speech signal of the utterance to be segmented, together with a set of phonological pronunciation rules. The output is a string of phonetic labels (SAM-PA [1]) and the corresponding segment boundaries in the speech signal. The system consists of three main stages. In the first stage, a set of general phonological rules is applied to the canonical pronunciation of an utterance, yielding a graph that contains the canonical form and its presumed variants. In the second, HMM-based stage, the speech signal of the utterance is time-aligned to this graph using a Viterbi search; the outcome is a time-aligned transcription of the input utterance. In the third stage, a new set of statistically weighted rules is derived, using this "raw" application of the phonological rules as the baseline. The procedure is repeated iteratively until the segmentation no longer changes.
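The first stage described in the abstract (applying phonological rules to a canonical pronunciation to obtain the canonical form plus presumed variants) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `variants`, the rule format, and the two example rules are hypothetical and are not taken from the paper. The sketch also enumerates variants as a flat set instead of building a graph, and it does not cover the Viterbi alignment or the statistical re-weighting of rules.

```python
# Hypothetical rule format: (left-hand side phones, replacement phones).
# An empty replacement means the matched phones are deleted.
# Both rules are illustrative examples in SAM-PA notation, not the paper's rule set.
RULES = [
    (("@", "n"), ("n",)),  # schwa elision before /n/
    (("t",), ()),          # /t/ deletion
]


def variants(canonical, rules):
    """Enumerate pronunciation variants of a canonical phone sequence.

    At every position the canonical phone is either kept, or any rule whose
    left-hand side matches at that position is applied. Rules are applied to
    the canonical form only; their output is not rewritten again.
    """
    canonical = tuple(canonical)
    results = set()

    def walk(pos, prefix):
        if pos == len(canonical):
            results.add(prefix)
            return
        # Option 1: keep the canonical phone unchanged.
        walk(pos + 1, prefix + (canonical[pos],))
        # Option 2: apply each rule that matches at this position.
        for lhs, rhs in rules:
            if canonical[pos:pos + len(lhs)] == lhs:
                walk(pos + len(lhs), prefix + rhs)

    walk(0, ())
    return results


if __name__ == "__main__":
    # Canonical SAM-PA form of German "haben".
    for variant in sorted(variants(["h", "a:", "b", "@", "n"], RULES)):
        print(" ".join(variant))
```

Because keeping the canonical phone is always an option, the canonical form is always among the returned variants, which mirrors the abstract's statement that the graph contains the canonical form together with its variations.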
Similar resources
Pronunciation modeling applied to automatic segmentation of spontaneous speech
In this paper two different models of pronunciation are presented: the first model is based on a rule set compiled by an expert, while the second is statistically based, exploiting a survey about pronunciation variants occurring in training data. Both models generate pronunciation variants from the canonic forms of words. The two models are evaluated by applying them to the task of automatic segme...
Independent automatic segmentation by self-learning categorial pronunciation rules
The goal of this paper is to present a new method for automatically generating pronunciation rules for the automatic segmentation of speech: the German MAUSER system. MAUSER is an algorithm that generates pronunciation rules independently of any domain-dependent training data, either by clustering and statistically weighting self-learned rules according to a small set of phonological rules clustered by...
Pronunciation Modeling Applied to Automatic Segmentation of Spontaneous
In this paper two different models of pronunciation are presented: the first model is based on a rule set compiled by an expert, while the second is statistically based, exploiting a survey about pronunciation variants occurring in training data. Both models generate pronunciation variants from the canonic forms of words. The two models are evaluated by applying them to the task of automatic seg...
A study of implicit and explicit modeling of coarticulation and pronunciation variation
In this paper, we focus on the modeling of coarticulation and pronunciation variation in Automatic Speech Recognition systems (ASR). Most ASR systems explicitly describe these production phenomena through context-dependent phoneme models and multiple pronunciation lexicons. Here, we explore the potential benefit of using feature spaces covering longer time segments in terms of implicit modeling...
Speech is like a box of
Pronunciation variability is present in both native and foreign words. Since pronunciation variability constitutes a problem for automatic speech recognition (ASR) systems, modeling pronunciation variation for ASR has been the topic of various studies. In most studies, modeling pronunciation variation was attempted within the standard framework used in mainstream ASR systems. Given that some as...